Graph-Based, Supervised Machine Learning Approach to (Irregular) Polysemy in WordNet
Author
Abstract
This paper presents a supervised machine learning approach that aims to annotate those homograph word forms in WordNet that share a common meaning and can therefore be regarded as belonging to a single polysemous word. A set of features is derived from several graph-based measures, and a random forest model is trained and evaluated on them. The results are compared with those of other features used for polysemy identification in WordNet. The features proposed in this paper not only outperform the commonly used CoreLex resource, but also work across different parts of speech and can be used to identify both regular and irregular polysemous word forms in WordNet.
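To make the idea of graph-based features for a pair of homograph senses concrete, here is a minimal sketch under stated assumptions. The abstract does not name the specific measures used, so the two below (the shortest path between the senses through a shared hypernym, and the depth of their lowest common subsumer) are illustrative choices, not the paper's feature set. The `HYPERNYMS` graph, the `bank(...)` node names and the helper functions are hypothetical toy data, not real WordNet synsets. In a full pipeline, each labeled pair's feature vector would become one training row for a random forest (for instance scikit-learn's `RandomForestClassifier`); that step is not shown.

```python
from collections import deque

# Hypothetical toy hypernym graph (NOT real WordNet data): each sense or
# concept maps to its direct hypernyms. "depository" has two hypernyms,
# mimicking WordNet's occasional multiple inheritance.
HYPERNYMS = {
    "bank(river)": ["slope"],
    "bank(institution)": ["financial_institution"],
    "bank(building)": ["depository"],
    "slope": ["geological_formation"],
    "geological_formation": ["object"],
    "financial_institution": ["institution"],
    "depository": ["financial_institution", "building"],
    "institution": ["organization"],
    "building": ["structure"],
    "structure": ["object"],
    "organization": ["group"],
    "object": ["entity"],
    "group": ["entity"],
    "entity": [],
}
ROOT = "entity"


def hypernym_distances(node):
    """Breadth-first search upward: minimal number of hypernym steps
    from `node` to each of its ancestors (the node itself is at 0)."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for parent in HYPERNYMS[current]:
            if parent not in dist:
                dist[parent] = dist[current] + 1
                queue.append(parent)
    return dist


def depth(node):
    """Shortest hypernym distance from `node` up to the root."""
    return hypernym_distances(node)[ROOT]


def pair_features(sense_a, sense_b):
    """Return (path_length, lcs_depth) for two senses of one word form.

    path_length: fewest edges between the senses via a shared ancestor.
    lcs_depth:   depth of the lowest common subsumer; among ancestors with
                 the same path length, the deepest one is preferred.
    """
    dist_a = hypernym_distances(sense_a)
    dist_b = hypernym_distances(sense_b)
    common = dist_a.keys() & dist_b.keys()
    lcs = min(common, key=lambda a: (dist_a[a] + dist_b[a], -depth(a)))
    return dist_a[lcs] + dist_b[lcs], depth(lcs)


if __name__ == "__main__":
    print(pair_features("bank(institution)", "bank(building)"))  # (3, 4)
    print(pair_features("bank(river)", "bank(institution)"))      # (9, 0)
```

In this toy graph, the institution and building senses meet at a fairly specific shared ancestor (short path, deep subsumer), while the river and institution senses only meet at the root. Feature patterns of this kind are what a classifier could learn to associate with polysemy versus mere homonymy.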
Similar resources
Word Sense vs. Word Domain Disambiguation: A Maximum Entropy Approach
In this paper, a supervised learning system for word sense disambiguation is presented. It is based on conditional maximum entropy models. The system acquires linguistic knowledge from an annotated corpus, and this knowledge is represented in the form of features. The system was evaluated using both WordNet’s senses and its domains as the set of classes for each word. Domain labels are obtained...
University_Of_Sheffield: Two Approaches to Semantic Text Similarity
This paper describes the University of Sheffield’s submission to SemEval-2012 Task 6: Semantic Text Similarity. Two approaches were developed. The first is an unsupervised technique based on the widely used vector space model and information from WordNet. The second method relies on supervised machine learning and represents each sentence as a set of n-grams. This approach also makes use of inf...
Emotion Detection in Persian Text; A Machine Learning Model
This study aimed to develop a computational model for recognizing emotion in Persian text, framed as a supervised machine learning problem. We adopted Plutchik's emotion model as the supervised learning criterion and a Support Vector Machine (SVM) as the baseline classifier. We also used the NRC lexicon and contextual features as training data and as components of the model. One hundred selected texts including pol...
A Methodology for Word Sense Disambiguation at 90% based on large-scale CrowdSourcing
Word Sense Disambiguation has been stuck for many years. In this paper we explore the use of large-scale crowdsourcing to cluster senses that are often confused by non-expert annotators. We show that we can increase performance at will: our in-domain experiment involving 45 highly polysemous nouns, verbs and adjectives (9.8 senses on average) yields an average accuracy of 92.6 using a supervise...
Spanish All-Words Semantic Class Disambiguation Using Cast3LB Corpus
In this paper, an approach to semantic disambiguation for Spanish based on machine learning and semantic classes is presented. A critical issue in a corpus-based approach to Word Sense Disambiguation (WSD) is the lack of wide-coverage resources from which to automatically learn the linguistic information. In particular, all-words sense-annotated corpora such as SemCor do not have enough examples for many...